Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection


Abstract

Speaker adaptation and speaker change detection have both been studied extensively to improve automatic speech recognition (ASR). In many cases, these two problems are investigated separately: speaker change detection is implemented first to obtain single-speaker regions, then speaker adaptation is performed using the derived segments for improved ASR. However, in an online setting, we want to achieve both goals in a single pass. In this study, we propose a neural network architecture that learns a speaker embedding from which it can perform both speaker adaptation for ASR and speaker change detection. The proposed speaker embedding is computed using self-attention based on an auxiliary network attached to a main ASR network. ASR adaptation is then performed by subtracting, from the main network activations, a segment dependent affine transformation of the learned embedding. In experiments on a broadcast news dataset and the Switchboard conversational dataset, we test our system on utterances with a change point in them and show that the proposed method achieves significantly better performance as compared to the unadapted system (10-14% relative reduction in word error rate (WER)). The proposed method also outperforms three different speaker segmentation methods followed by adaptation (around 10% relative reduction in WER).
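The adaptation step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the single-query attention pooling, the function names, the array shapes, and the randomly drawn parameters `w_att`, `A`, and `b` are all assumptions made for illustration. In the paper these quantities would be learned, and the affine transformation would depend on the segment.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pooled_embedding(aux_feats, w_att):
    """Pool frame-level auxiliary features (T, d) into one embedding (d,).

    Assumption: a single learned query vector w_att scores each frame;
    the paper's self-attention layer may be structured differently.
    """
    scores = aux_feats @ w_att      # one score per frame, shape (T,)
    weights = softmax(scores)       # attention weights, sum to 1
    return weights @ aux_feats      # weighted average of frames, shape (d,)

def adapt_activations(main_acts, embedding, A, b):
    """Subtract an affine transform of the embedding from activations (T, h)."""
    shift = A @ embedding + b       # shape (h,)
    return main_acts - shift        # broadcast over all T frames

# Toy data standing in for network outputs (hypothetical sizes).
rng = np.random.default_rng(0)
T, d, h = 50, 8, 16
aux_feats = rng.standard_normal((T, d))   # auxiliary-network features
main_acts = rng.standard_normal((T, h))   # main ASR network activations
w_att = rng.standard_normal(d)
A = rng.standard_normal((h, d))
b = rng.standard_normal(h)

emb = attention_pooled_embedding(aux_feats, w_att)
adapted = adapt_activations(main_acts, emb, A, b)
```

Here the same shift is removed from every frame of the segment, which is one simple reading of "subtracting an affine transformation of the embedding"; how the segment boundaries and the transformation are obtained is left to the paper.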


Related resources

On-line incremental speaker adaptation with automatic speaker change detection

In order to improve the performance of speech recognition systems when speakers change frequently and each of them utters a series of several sentences, a new unsupervised, online and incremental speaker adaptation technique combined with automatic detection of speaker changes is proposed. The speaker change is detected by comparing likelihoods using speaker-independent and speaker-adaptive GMM...


Joint Environment and Speaker Adaptation

In this paper we address the problem of speaker adaptation in noisy environments. We aim at estimating speaker adapted models from noisy data by combining unsupervised speaker adaptation with model-based noise compensation. Speaker adapted models obtained with this method should contain as little information about the environment as possible, so that they can be reused in different environments...


Speaker change detection using joint audio-visual statistics

In this paper, we present an approach for speaker change detection in broadcast video using joint audio-visual scene change statistics. Our experiments indicate that using joint audio-visual statistics we achieve better recall without loss of precision as compared to purely audio domain approaches for speaker change detection.


Variational Bayesian speaker change detection

In this paper we study the use of Variational Bayesian (VB) methods for speaker change detection and we compare results with the classical BIC solution. VB methods are approximate learning algorithms for fully Bayesian inference that cannot be achieved in an exact form. They embed in the objective function (also known as free energy) a term that penalizes more complex models. Experiments are r...


A DP algorithm for speaker change detection

The Bayesian Information Criterion (BIC) is a widely adopted method for audio segmentation; typically, it is applied within a sliding variable-size analysis window where single changes in the nature of the audio are locally searched. In this work, a dynamic programming algorithm which uses the BIC method for globally segmenting the input audio stream is described, analyzed, and experimentally e...
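For orientation, the local single-change BIC test that this abstract contrasts with can be sketched as follows. This is the standard full-covariance Gaussian ΔBIC within one window, not the dynamic programming algorithm of the paper. The function names, the `margin` parameter, the penalty weight `lam`, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of frames x (n, d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x, i, lam=1.0):
    """Delta-BIC for splitting window x at frame i; positive favors a change."""
    n, d = x.shape
    # Penalty for the extra mean and covariance parameters of the two-model fit.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(x)
            - 0.5 * i * logdet_cov(x[:i])
            - 0.5 * (n - i) * logdet_cov(x[i:])
            - penalty)

def best_change_point(x, margin=10, lam=1.0):
    """Search candidate split points, keeping `margin` frames on each side."""
    candidates = range(margin, len(x) - margin)
    scores = [delta_bic(x, i, lam) for i in candidates]
    k = int(np.argmax(scores))
    return candidates[k], scores[k]

# Synthetic two-"speaker" window: the feature mean shifts at frame 100.
rng = np.random.default_rng(0)
left = rng.standard_normal((100, 2))
right = rng.standard_normal((100, 2)) + 5.0
window = np.vstack([left, right])

idx, score = best_change_point(window)
```

A positive `score` at `idx` indicates that two Gaussians explain the window better than one even after the complexity penalty, so a change is hypothesized at that frame.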



Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2020.3040626